Aggregate and mixed-order Markov models for statistical language processing
Abstract
We consider the use of language models whose size and accuracy are intermediate between those of different-order n-gram models. Two types of models are studied in particular. Aggregate Markov models are class-based bigram models in which the mapping from words to classes is probabilistic. Mixed-order Markov models combine bigram models whose predictions are conditioned on different preceding words. Both types of models are trained by Expectation-Maximization (EM) algorithms for maximum likelihood estimation. We examine smoothing procedures in which these models are interposed between different-order n-grams. This is found to significantly reduce the perplexity of unseen word combinations.
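To make the aggregate model concrete, here is a minimal sketch of EM training. It assumes the class-based bigram factorization P(w2|w1) = Σ_c P(c|w1) P(w2|c) with a fixed number of latent classes, estimated from raw bigram counts. The function names (`train_aggregate_bigram`, `bigram_prob`, `log_likelihood`), the random initialization, and the toy setup are illustrative assumptions, not the authors' implementation. The sketch does no smoothing or held-out evaluation.

```python
import math
import random
from collections import Counter


def train_aggregate_bigram(tokens, num_classes, iterations=20, seed=0):
    """Illustrative EM for P(w2|w1) = sum_c P(c|w1) * P(w2|c) (assumed factorization)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    counts = Counter(zip(tokens, tokens[1:]))  # bigram counts N(w1, w2)

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random positive initialization so the classes can break symmetry.
    p_class = {w: normalized([rng.random() + 0.1 for _ in range(num_classes)]) for w in vocab}
    p_word = [dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab])))
              for _ in range(num_classes)]

    for _ in range(iterations):
        class_acc = {w: [0.0] * num_classes for w in vocab}
        word_acc = [{w: 0.0 for w in vocab} for _ in range(num_classes)]
        for (w1, w2), n in counts.items():
            # E-step: posterior over the hidden class given the observed bigram.
            joint = [p_class[w1][c] * p_word[c][w2] for c in range(num_classes)]
            z = sum(joint)
            for c in range(num_classes):
                expected = n * joint[c] / z
                class_acc[w1][c] += expected
                word_acc[c][w2] += expected
        # M-step: renormalize the expected counts into probabilities.
        for w in vocab:
            total = sum(class_acc[w])
            if total > 0:  # words never seen as w1 keep their previous P(c|w)
                p_class[w] = [a / total for a in class_acc[w]]
        for c in range(num_classes):
            total = sum(word_acc[c].values())
            if total > 0:
                p_word[c] = {w: a / total for w, a in word_acc[c].items()}
    return p_class, p_word


def bigram_prob(p_class, p_word, w1, w2):
    """P(w2|w1) under the aggregate model: mix class-conditional word distributions."""
    return sum(pc * p_word[c].get(w2, 0.0) for c, pc in enumerate(p_class[w1]))


def log_likelihood(tokens, p_class, p_word):
    """Training-set log-likelihood of the bigram sequence."""
    return sum(math.log(bigram_prob(p_class, p_word, a, b))
               for a, b in zip(tokens, tokens[1:]))
```

Because EM never decreases the training likelihood, running more iterations from the same seed should give a log-likelihood at least as high. Plain dictionaries keep the sketch dependency-free; a realistic vocabulary would call for sparse matrix storage.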
Journal: CoRR
Volume: cmp-lg/9706007
Publication date: 1997